Performance Matrix


Appendix: CGLB: Benchmark Tasks for Continual Graph Learning

Neural Information Processing Systems

Moreover, the 47th class of Products-CL contains only one node and therefore cannot be split into training, validation, and test sets. We provide the splits used in our experiments on our GitHub page as a reference. The selected model is then also automatically evaluated on the test set. Details on the usage can be found on our GitHub page. The names of the hyper-parameters are consistent with the names in our code.
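The splitting code itself lives on the GitHub page; as a rough illustration of why a single-node class must be excluded, a per-class split might look like the sketch below (the function name, ratios, and skip threshold are assumptions for this example, not CGLB's actual code):

```python
import random

def split_by_class(node_ids_by_class, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split each class's nodes into train/val/test, skipping classes
    too small to appear in all three splits (e.g. a single-node class)."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls, nodes in node_ids_by_class.items():
        if len(nodes) < 3:
            continue  # cannot be split three ways, like Products-CL's 47th class
        nodes = list(nodes)
        rng.shuffle(nodes)
        n_train = max(1, int(len(nodes) * ratios[0]))
        n_val = max(1, int(len(nodes) * ratios[1]))
        train += nodes[:n_train]
        val += nodes[n_train:n_train + n_val]
        test += nodes[n_train + n_val:]
    return train, val, test

# Class 47 (one node) is silently dropped; class 0 is split 3/1/1.
print(split_by_class({0: [1, 2, 3, 4, 5], 47: [6]}))
```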


Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in LLM Fine-tuning

Krishna, Shambhavi, Naik, Atharva, Agarwal, Chaitali, Govindan, Sudharshan, Lee, Taesung, Chang, Haw-Shiuan

arXiv.org Artificial Intelligence

Large language models are increasingly deployed across diverse applications, often including tasks they have not encountered during training. Enumerating and obtaining high-quality training data for all such tasks is infeasible, so we often need to rely on transfer learning from datasets with different characteristics and anticipate out-of-distribution requests. Motivated by this practical need, we propose an analysis framework that builds a transfer learning matrix and applies dimensionality reduction to dissect these cross-task interactions. We train and analyze 10 models to identify latent abilities (e.g., Reasoning, Sentiment Classification, NLU, Arithmetic) and discover the side effects of transfer learning. Our findings reveal that performance improvements often defy explanations based on surface-level dataset similarity or source data quality. Instead, hidden statistical factors of the source dataset, such as class distribution and generation-length proclivities, alongside specific linguistic features, are more influential. This work offers insights into the complex dynamics of transfer learning, paving the way for more predictable and effective LLM adaptation.
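The abstract does not spell out the matrix construction, but the framework's core idea, a transfer matrix factorized by dimensionality reduction to expose latent abilities, can be sketched as follows (the scores and the use of SVD are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

# Hypothetical inputs: rows = fine-tuning (source) datasets, columns =
# evaluation tasks; each entry is the score change relative to the base model.
transfer_matrix = np.array([
    [ 0.12, -0.03,  0.08,  0.01],   # e.g. fine-tuned on a reasoning set
    [-0.02,  0.15,  0.04, -0.05],   # e.g. fine-tuned on a sentiment set
    [ 0.09,  0.02,  0.11,  0.03],
])

# Center the matrix and take its top principal directions; each component
# can be read as a latent ability shared across tasks.
centered = transfer_matrix - transfer_matrix.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
print("latent-ability directions (top 2):\n", components[:2])
print("explained variance:", singular_values[:2] ** 2 / (singular_values ** 2).sum())
```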


Appendix: CGLB: Benchmark Tasks for Continual Graph Learning

Neural Information Processing Systems

The other 30 classes of Aromaticity-CL are kept and constructed as 15 tasks. The names of the hyper-parameters are consistent with the names in our code. For the two multi-label classification datasets (SIDER-tIL and Tox21-tIL), early stopping is applied to ensure stable performance. Table 1: Hyper-parameter candidates used for grid search. In this subsection, we explain the evaluation metrics in detail.
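The excerpt ends before the metric definitions, but continual-learning benchmarks like CGLB commonly derive them from a performance matrix M, where M[i, j] is the score on task j after training on task i. A minimal sketch using the widely used average-performance and average-forgetting formulations (assumed here, not necessarily CGLB's exact definitions):

```python
import numpy as np

# M[i, j]: test performance on task j after the model finishes learning task i.
M = np.array([
    [0.90, 0.00, 0.00],
    [0.75, 0.88, 0.00],
    [0.70, 0.80, 0.85],
])
T = M.shape[0]

# Average performance: mean score over all tasks after the final task.
average_performance = M[-1, :].mean()

# Average forgetting: drop from each task's best score (once learned) to its
# final score, averaged over all but the last task.
forgetting = [M[j:-1, j].max() - M[-1, j] for j in range(T - 1)]
average_forgetting = float(np.mean(forgetting))

print(average_performance, average_forgetting)  # 0.7833..., 0.14
```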


AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development

Saxena, Devansh, Jung, Ji-Youn, Forlizzi, Jodi, Holstein, Kenneth, Zimmerman, John

arXiv.org Artificial Intelligence

AI systems are often introduced with high expectations, yet many fail to deliver, resulting in unintended harm and missed opportunities for benefit. We frequently observe significant "AI Mismatches", where the system's actual performance falls short of what is needed to ensure safety and co-create value. These mismatches are particularly difficult to address once development is underway, highlighting the need for early-stage intervention. Navigating complex, multi-dimensional risk factors that contribute to AI Mismatches is a persistent challenge. To address it, we propose an AI Mismatch approach to anticipate and mitigate risks early on, focusing on the gap between realistic model performance and required task performance. Through an analysis of 774 AI cases, we extracted a set of critical factors, which informed the development of seven matrices that map the relationships between these factors and highlight high-risk areas. Through case studies, we demonstrate how our approach can help reduce risks in AI development.


MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning

Park, Namyong, Rossi, Ryan, Ahmed, Nesreen, Faloutsos, Christos

arXiv.org Artificial Intelligence

Given a graph learning task, such as link prediction, on a new graph, how can we select the best method as well as its hyperparameters (collectively called a model) without having to train or evaluate any model on the new graph? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the other hand, systematically comparing models on the new graph quickly becomes too costly, or even impractical. In this work, we develop the first meta-learning approach for evaluation-free graph learning model selection, called MetaGL, which utilizes the prior performances of existing methods on various benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluation. To quantify similarities across a wide variety of graphs, we introduce specialized meta-graph features that capture the structural characteristics of a graph. We then design the G-M network, which represents the relations among graphs and models, and develop a graph-based meta-learner operating on this G-M network, which estimates the relevance of each model to different graphs. Extensive experiments show that using MetaGL to select a model for the new graph greatly outperforms several existing meta-learning techniques tailored for graph learning model selection (up to 47% better), while being extremely fast at test time (~1 sec).
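MetaGL's meta-graph features and graph-based meta-learner are more involved than this, but the evaluation-free selection idea can be approximated with a nearest-neighbor lookup over a prior performance matrix. Everything below, including the feature vectors and model names, is a toy assumption, not the paper's implementation:

```python
import numpy as np

def select_model(new_graph_features, graph_features, perf_matrix, model_names):
    """Pick a model for an unseen graph without training on it: find the most
    similar benchmark graph in meta-feature space and return the model that
    performed best there (a simplified stand-in for MetaGL's G-M meta-learner)."""
    # Cosine similarity between the new graph and each benchmark graph.
    g = graph_features / np.linalg.norm(graph_features, axis=1, keepdims=True)
    q = new_graph_features / np.linalg.norm(new_graph_features)
    nearest = int(np.argmax(g @ q))
    return model_names[int(np.argmax(perf_matrix[nearest]))]

# Toy data: 3 benchmark graphs x 2 meta-features; 3 candidate models.
graph_feats = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
perf = np.array([[0.7, 0.6, 0.5], [0.4, 0.8, 0.6], [0.5, 0.5, 0.9]])
print(select_model(np.array([0.85, 0.2]), graph_feats, perf,
                   ["GCN", "node2vec", "GraphSAGE"]))  # -> "GCN"
```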


Machine Learning Performance Metrics

#artificialintelligence

In machine learning performance metrics, the numbers have an important story to tell; they rely on you to give them a voice. Whether you are a non-technical person in sales, marketing, or operations, or come from a technical background such as data science, engineering, or development, it is equally important to understand how performance metrics work for machine learning.
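As a concrete starting point, the most common classification metrics fall out of a binary confusion matrix; the counts in this sketch are invented purely for illustration:

```python
# True positives, false positives, false negatives, true negatives
# (made-up counts for the example).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # fraction of all correct calls
precision = tp / (tp + fp)                    # of predicted positives, how many are real
recall    = tp / (tp + fn)                    # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```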